This repository contains implementations of model-free on-policy single-agent reinforcement learning algorithms.

MDP Generation Classes:
Env.py and utils.py

Algorithms Implemented:
FedQ_EarlySettled_simple.py: Our proposed Q-EarlySettled-LowCost algorithm (with M=1)
QHoeffding.py: UCB-Hoeffding algorithm
QHoeffdinglow.py: UCB2-Hoeffding algorithm
QBernstein.py: UCB-Bernstein algorithm
QBernsteinlow.py: UCB2-Bernstein algorithm
Qadv.py: UCB-Advantage algorithm
Qearly.py: Q-EarlySettled-Advantage algorithm
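The UCB-style algorithms listed above share a common core: a tabular Q-update with a stage-dependent learning rate and a count-based optimism bonus. Below is a minimal sketch of one such update, the Hoeffding-bonus variant, assuming a finite-horizon tabular MDP. The function name, the constant c, and the log factor iota are illustrative placeholders; the exact schedules and constants used in this repository's scripts may differ.

```python
import numpy as np

def ucb_hoeffding_update(Q, V, N, h, s, a, r, s_next, H, c=1.0, iota=1.0):
    """One Q-learning step with a Hoeffding-style UCB bonus (sketch).

    Q: (H+1, S, A) optimistic Q estimates; Q[H] is all zeros.
    V: (H+1, S)    optimistic value estimates; V[H] is all zeros.
    N: (H, S, A)   visit counts.
    h, s, a, r, s_next: stage, state, action, reward, next state.
    c, iota: bonus constant and log factor (placeholder values).
    """
    N[h, s, a] += 1
    t = N[h, s, a]
    alpha = (H + 1) / (H + t)              # stage-dependent learning rate
    bonus = c * np.sqrt(H**3 * iota / t)   # exploration bonus, shrinks with visits
    target = r + V[h + 1, s_next] + bonus
    Q[h, s, a] = (1 - alpha) * Q[h, s, a] + alpha * target
    V[h, s] = min(H, Q[h, s].max())        # values clipped at the horizon H
    return Q, V, N
```

The Bernstein variants replace the bonus with one built from an empirical variance estimate, and the advantage/early-settled variants additionally maintain reference value functions; the update skeleton stays the same.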

Experimental Setup:
All experiments are run on a server with Intel Xeon E5-2650 v4 (2.2 GHz) CPUs and 100 cores. Each replication is limited to a single core and 8 GB of RAM; the total execution time is about 5 hours.
Two test configurations are provided:
532single.py: Script submitted to the server for (H, S, A) = (5, 3, 2)
7105single.py: Script submitted to the server for (H, S, A) = (7, 10, 5)
These configurations include all necessary hyper-parameter values for the experiments.